21 research outputs found

Relating dominance formalisms

Author: Koller Alexander
Rambow Owen C.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

We establish for the first time a formal relationship between dominance graphs, used for modeling semantics, and grammar formalisms with underspecified dominance links, used for modeling syntax. We present a translation of normal dominance graphs into Unordered Vector Grammars with Dominance Links (UVG-DL) and prove that the configurations of the dominance graph correspond to the derivation trees of the grammar. Moreover, the standard algorithms for both formalisms compute isomorphic charts

Columbia University Academic Commons

Grammar Approximation by Representative Sublanguage: A New Model for Language Learning

Author: Muresan Smaranda
Rambow Owen C.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

We propose a new language learning model that learns a syntactic-semantic grammar from a small number of natural language strings annotated with their semantics, along with basic assumptions about natural language syntax. We show that the search space for grammar induction is a complete grammar lattice, which guarantees the uniqueness of the learned grammar

Columbia University Academic Commons

MADA+TOKAN Manual

Author: Habash Nizar Y.
Rambow Owen C.
Roth Ryan M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010

MADA is a system for Morphological Analysis and Disambiguation for Arabic. TOKAN is a general tokenizer for MADA-disambiguated text. Internally, MADA also makes use of ALMORGEANA, an Arabic lexeme-based morphology analyzer

Columbia University Academic Commons

Arabic Diacritization through Full Morphological Tagging

Author: Habash Nizar Y.
Rambow Owen C.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

We present a diacritization system for written Arabic which is based on a lexical resource. It combines a tagger and a lexeme language model. It improves on the best results reported in the literature

Columbia University Academic Commons

VigNet: Grounding Language in Graphics using Frame Semantics

Author: Bauer Daniel
Coyne Robert Eric
Rambow Owen C.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011

This paper introduces Vignette Semantics, a lexical semantic theory based on Frame Semantics that represents conceptual and graphical relations. We also describe a lexical resource that implements this theory, VigNet, and its application in text-to-scene generation

Columbia University Academic Commons

Frame Semantics in Text-to-Scene Generation

Author: Coyne Robert Eric
Hirschberg Julia Bell
Rambow Owen C.
Sproat Richard
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010

3D graphics scenes are difficult to create, requiring users to learn and utilize a series of complex menus, dialog boxes, and often tedious direct manipulation techniques. By giving up some amount of control afforded by such interfaces we have found that users can use natural language to quickly and easily create a wide variety of 3D scenes. Natural language offers an interface that is intuitive and immediately accessible by anyone, without requiring any special skill or training. The WordsEye system (http://www.wordseye.com) has been used by several thousand users on the web to create over 10,000 scenes. The system relies on a large database of 3D models and poses to depict entities and actions. We describe how the current version of the system incorporates the type of lexical and real-world knowledge needed to depict scenes from language

Columbia University Academic Commons

Collecting Spatial Information for Locations in a Text-to-Scene Conversion System

Author: Bauer Daniel
Coyne Robert Eric
Rambow Owen C.
Rouhizadeh Masoud
Sproat Richard
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011

We investigate using Amazon Mechanical Turk (AMT) for building a low-level description corpus and populating VigNet, a comprehensive semantic resource that we will use in a text-to-scene generation system. To depict a picture of a location, VigNet should contain the knowledge about the typical objects in that location and the arrangements of those objects. Such information is mostly common-sense knowledge that is taken for granted by human beings and is not stated in existing lexical resources and in text corpora. In this paper we focus on collecting objects of locations using AMT. Our results show that it is a promising approach

Columbia University Academic Commons

Conventional Orthography for Dialectal Arabic (CODA): Principles and Guidelines -- Egyptian Arabic - Version 0.7 - March 2012

Author: Diab Mona T.
Habash Nizar Y.
Rambow Owen C.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014

This document introduces CODA (Conventional Orthography for Dialectal Arabic) and presents specifications and detailed guidelines for Egyptian Arabic CODA. CODA addresses the problem of inconsistent orthographic choices in raw (naturally occurring) written dialectal Arabic text. The specifications are a succinct summary, while the guidelines contain details and examples. The document has three parts that are ordered from the most general to the most specific. In Part 1, we define CODA and present its general goals, principles and considerations in a non-dialect specific manner. In Part 2, we present a high level CODA specification for Egyptian Arabic (EGY). And in Part 3, we present detailed guidelines for EGY CODA

Columbia University Academic Commons

Parsing Arabic Dialects

Author: Chiang David
Diab Mona T.
Habash Nizar Y.
Hwa Rebecca
Lacey Vincent
Levy Roger
Nichols Carol
Rambow Owen C.
Shareef Safiullah
Sima'an Khalil
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel corpus LA-MSA. Instead, we use explicit knowledge about the relation between LA and MSA

Columbia University Academic Commons